On the complexity of finding common approximate substrings

نویسندگان

  • Patricia A. Evans
  • Andrew D. Smith
  • Todd Wareham
چکیده

Problems associated with #nding strings that are within a speci#ed Hamming distance of a given set of strings occur in several disciplines. In this paper, we use techniques from parameterized complexity to assess non-polynomial time algorithmic options and complexity for the COMMON APPROXIMATE SUBSTRING (CAS) problem. Our analyses indicate under which parameter restrictions useful algorithms are possible, and include both class membership and parameterized reductions to prove class hardness. In order to achieve #xed-parameter tractability, either a #xed string length or both a #xed size alphabet and #xed substring length are su7cient. Fixing either the string length or the alphabet size and Hamming distance is shown to be necessary, unless W [1] = FPT . An assortment of parameterized class membership and hardness results cover all other parameterized variants, showing in particular the e<ect of #xing the number of strings. c © 2003 Elsevier B.V. All rights reserved.

برای دانلود رایگان متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Efficient algorithms for the longest common subsequence in $k$-length substrings

Finding the longest common subsequence in k-length substrings (LCSk) is a recently proposed problem motivated by computational biology. This is a generalization of the well-known LCS problem in which matching symbols from two sequences A and B are replaced with matching non-overlapping substrings of length k from A and B. We propose several algorithms for LCSk, being non-trivial incarnations of...

متن کامل

Efficient Enumeration of Phylogenetically Informative Substrings

We study the problem of enumerating substrings that are common amongst genomes that share evolutionary descent. For example, one might want to enumerate all identical (therefore conserved) substrings that are shared between all mammals and not found in non-mammals. Such collection of substrings may be used to identify conserved subsequences or to construct sets of identifying substrings for bra...

متن کامل

On the computational complexity of finding a minimal basis for the guess and determine attack

Guess-and-determine attack is one of the general attacks on stream ciphers. It is a common cryptanalysis tool for evaluating security of stream ciphers. The effectiveness of this attack is based on the number of unknown bits which will be guessed by the attacker to break the cryptosystem. In this work, we present a relation between the minimum numbers of the guessed bits and uniquely restricted...

متن کامل

An Efficient Algorithm for Finding Similar Short Substrings from Large Scale String Data

Finding similar substrings/substructures is a central task in analyzing huge amounts of string data such as genome sequences, web documents, log data, etc. In the sense of complexity theory, the existence of polynomial time algorithms for such problems is usually trivial since the number of substrings is bounded by the square of their lengths. However, straightforward algorithms do not work for...

متن کامل

Suffix Trees and Suffix Arrays

Iowa State University 1.1 Basic Definitions and Properties . . . . . . . . . . . . . . . . . . . . 1-1 1.2 Linear Time Construction Algorithms . . . . . . . . . . . . . 1-4 Suffix Trees vs. Suffix Arrays • Linear Time Construction of Suffix Trees • Linear Time Construction of Suffix Arrays • Space Issues 1.3 Applications . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . ...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

عنوان ژورنال:
  • Theor. Comput. Sci.

دوره 306  شماره 

صفحات  -

تاریخ انتشار 2003